ABSTRACT


A  technique  for  fusing  Kalman  filter  information  has  been  developed  by  Jeffrey  Uhlmann,  Simon 
Julier,  et  al.  that  addresses  the  problems  that  arise  from  fusing  correlated  measurements.  The  researchers 
have  named  this  technique  “covariance  intersection”  and  have  presented  papers  on  it  at  several  robotics 
and  control  theory  conferences.  The  technique  is  applicable  to  these  areas  because  robotic  systems  often 
have  data  flowing  between  multiple  interconnected  algorithms  with  no  guarantee  that  the  data  flowing 
into  any  algorithm  are  independent. 

It  can  be  shown  that  the  covariance  intersection  technique  is  a  log-linear  combination  of  two 
Gaussian  functions  and  is  thus  related  to  Chernoff  information.  Given  this  relationship,  covariance 
intersection  can  be  generalized  to  the  fusion  of  any  two  probability  density  functions.  One  of  the  selection 
criteria  suggested  by  the  developers  for  the  optimal  combination  of  two  Gaussian  functions  is  the 
minimization  of  the  determinant  of  the  fused  covariance,  which  is  equivalent  to  the  minimization  of  the 
Shannon  information  of  the  fused  state.  This  equivalence  justifies  the  selection  of  the  determinant 
criterion  for  many  applications  of  covariance  intersection.  Given  the  recognition  of  a  more  general  rule  for 
the  covariance  intersection  technique,  other  probabilistic  measures,  such  as  the  Chernoff  information, 
may  be  appropriate  for  other  fusion  applications. 




TABLE OF CONTENTS

Abstract
List of Illustrations
1. INTRODUCTION
2. COVARIANCE INTERSECTION
3. GENERALIZED FUSION
3.1 The Minimization Criterion
4. OTHER MINIMIZATION CRITERIA
5. THREE-DIMENSIONAL EXAMPLE
6. SUMMARY
REFERENCES




LIST OF ILLUSTRATIONS

Figure 1. Shannon information contour.
Figure 2. Log-linear chord between two points.
Figure 3. Information surface for a plane in log-probability space.


1.  INTRODUCTION 


In  the  1990s,  Jeffrey  Uhlmann,  Simon  Julier  and  their  associates  began  promoting  a  data  fusion 
technique  that  they  have  termed  “covariance  intersection”  [1,2,3,4].  Their  principal  application  of 
covariance  intersection  has  been  as  an  adjunct  to  Kalman  filters  when  input  data  into  Kalman  filters  are 
potentially  highly  correlated,  as  is  often  the  case  in  complex  control  systems.  Primary  uses  have  focused 
on  robotics  applications,  although  some  tests  have  been  conducted  for  fusing  target  tracks  [5]. 

The  data  fusion  technique  is  claimed  to  be  applicable  to  the  fusion  of  sensor  measurements,  data 
estimates,  or  similar  quantities  that  can  be  described  in  terms  of  a  Gaussian  probability  density  function.  It 
will  be  shown  that  the  covariance  intersection  technique  is  related  to  a  more  general  data  fusion  technique 
that  can  fuse  any  pair  of  related  probability  density  functions.  This  association  suggests  that  a  strict 
interpretation  of  the  covariance  intersection  technique  is  that  it  only  fuses  probability  density  functions 
and  not  measurements,  and  only  fuses  state  estimates  in  the  sense  that  these  estimates  are  represented  by 
probability  density  functions.  This  paper  presents  an  overview  of  the  covariance  intersection  technique, 
the  generalization  of  the  technique,  the  information  theoretic  associations,  and  an  example  of  the 
application  of  the  generalized  fusion  technique  to  a  simple  probabilistic  system. 


2.  COVARIANCE  INTERSECTION 


The  covariance  intersection  technique  is  based  upon  the  assumption  that  measurements  or  states  can 
be  described  with  Gaussian  probability  density  functions.  The  general  problem  with  the  fusion  of  two 
probability  density  functions  is  that  the  two  functions  may  have  been  estimated  from  shared 
measurements  and  therefore  are  coupled.  The  fusion  of  independent  density  functions  is  straightforward 
and  widely  recognized  as 

$$C^{-1} = A^{-1} + B^{-1}$$

$$c = C\left(A^{-1}a + B^{-1}b\right)$$

where $a$, $b$, and $c$ are the statistical means and $A$, $B$, and $C$ are the covariances. The
appropriate application of this fusion rule is the estimation of a probability density function that
describes two independent sets of measurements assumed to be identically distributed according to a
single probability density function.
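
As a minimal sketch of the independent fusion rule above, the following Python fragment fuses two 2-D Gaussian estimates. The helper names (`inv2`, `add2`, `matvec2`, `fuse_independent`) and the numerical values are illustrative assumptions, not part of the original report, and the 2x2 restriction is only to keep the example free of external libraries.

```python
# Sketch only: 2x2 matrices as nested lists; all names here are hypothetical.

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = m
    d = p * s - q * r
    return [[s / d, -q / d], [-r / d, p / d]]

def add2(m, k):
    """Element-wise sum of two 2x2 matrices."""
    return [[m[i][j] + k[i][j] for j in range(2)] for i in range(2)]

def matvec2(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def fuse_independent(a, A, b, B):
    """Fuse two independent Gaussian estimates (mean, covariance)."""
    A_inv, B_inv = inv2(A), inv2(B)
    C = inv2(add2(A_inv, B_inv))                     # C^{-1} = A^{-1} + B^{-1}
    ya, yb = matvec2(A_inv, a), matvec2(B_inv, b)
    c = matvec2(C, [ya[0] + yb[0], ya[1] + yb[1]])   # c = C (A^{-1} a + B^{-1} b)
    return c, C

# Assumed example values: each estimate is precise along a different axis.
c, C = fuse_independent([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]],
                        [2.0, 2.0], [[4.0, 0.0], [0.0, 1.0]])
print(c, C)
```

The fused covariance is smaller than either input along both axes, which is only justified when the two estimates really are independent.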

Uhlmann et al. suggest that when coupling between probability density functions is likely and the
degree of coupling is unknown, a linear combination of the sigma contours

$$f_A(x) = (x-a)^T A^{-1}(x-a), \qquad f_B(x) = (x-b)^T B^{-1}(x-b)$$

be used, where

$$f_C(x) \le \max\left(f_A(x), f_B(x)\right),$$ and

$$f_C(x) \le \omega f_A(x) + (1-\omega) f_B(x). \qquad (1)$$

Expansion of Equation 1 gives

$$(x-c)^T C^{-1}(x-c) \le \omega (x-a)^T A^{-1}(x-a) + (1-\omega)(x-b)^T B^{-1}(x-b),$$

which can be satisfied by

$$C^{-1} = \omega A^{-1} + (1-\omega)B^{-1}$$

$$c = C\left(\omega A^{-1}a + (1-\omega)B^{-1}b\right)$$

The remaining task is to select an appropriate value for the mixing parameter $\omega$ through the
optimization of a chosen criterion. Two criteria that have been suggested are the minimization of the
determinant of the fused covariance and the minimization of the trace of the fused covariance. It will be
shown that minimization of the determinant results in a minimization of the entropy of the fused density
function and is the criterion suggested by information theory.
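
The covariance intersection rule with the determinant criterion can be sketched as follows. This is an illustration under stated assumptions rather than the developers' implementation: the mixing parameter is found by a plain grid search over [0, 1], the matrices are restricted to 2x2 nested lists, and every function name and numerical value is hypothetical.

```python
# Sketch only: grid search for the mixing parameter; names are hypothetical.

def inv2(m):
    """Inverse of a 2x2 matrix."""
    (p, q), (r, s) = m
    d = p * s - q * r
    return [[s / d, -q / d], [-r / d, p / d]]

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def matvec2(m, v):
    """Product of a 2x2 matrix and a 2-vector."""
    return [m[0][0] * v[0] + m[0][1] * v[1],
            m[1][0] * v[0] + m[1][1] * v[1]]

def covariance_intersection(a, A, b, B, steps=1000):
    """Return (w, c, C) with w chosen to minimize det(C) on a grid."""
    A_inv, B_inv = inv2(A), inv2(B)
    best = None
    for k in range(steps + 1):
        w = k / steps
        # C^{-1} = w A^{-1} + (1 - w) B^{-1}
        C_inv = [[w * A_inv[i][j] + (1 - w) * B_inv[i][j] for j in range(2)]
                 for i in range(2)]
        C = inv2(C_inv)
        if best is None or det2(C) < best[0]:
            best = (det2(C), w, C)
    _, w, C = best
    ya, yb = matvec2(A_inv, a), matvec2(B_inv, b)
    # c = C (w A^{-1} a + (1 - w) B^{-1} b)
    c = matvec2(C, [w * ya[i] + (1 - w) * yb[i] for i in range(2)])
    return w, c, C

# Assumed example values: each estimate is precise along a different axis.
w, c, C = covariance_intersection([0.0, 0.0], [[1.0, 0.0], [0.0, 4.0]],
                                  [2.0, 2.0], [[4.0, 0.0], [0.0, 1.0]])
print(w, c, C)
```

For this symmetric pair the search settles on the midpoint of the interval, and the fused determinant (about 2.56) is below the determinant of either input (4). A bounded scalar optimizer could replace the grid search; the grid is used here only to keep the sketch dependency-free.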


3.  GENERALIZED  FUSION 


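Further justification for the covariance intersection technique can be obtained by examining the
equation used to calculate Chernoff information,

$$P_C(x) = \frac{P_A^{\omega}(x)\,P_B^{1-\omega}(x)}{\int P_A^{\omega}(x)\,P_B^{1-\omega}(x)\,dx}. \qquad (2)$$
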

This  function  constructs  a  fused  probability  density  function  from  a  log-linear  combination  of  two 
probability  density  functions,  followed  with  a  renormalization  of  the  combination.  Equation  2  becomes 

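$$P_C(x) = \frac{e^{-\omega (x-a)^T A^{-1}(x-a)/2 \,-\, (1-\omega)(x-b)^T B^{-1}(x-b)/2}}{\int e^{-\omega (x-a)^T A^{-1}(x-a)/2 \,-\, (1-\omega)(x-b)^T B^{-1}(x-b)/2}\,dx} \qquad (3)$$

for Gaussian functions, where the normalization terms that appear in the numerator and
denominator cancel. The exponential term,

$$-t/2 = -\left(\omega (x-a)^T A^{-1}(x-a) + (1-\omega)(x-b)^T B^{-1}(x-b)\right)/2,$$

can be rewritten as

$$t = x^T\left(\omega A^{-1} + (1-\omega)B^{-1}\right)x - \left(a^T \omega A^{-1} + b^T (1-\omega) B^{-1}\right)x - x^T\left(\omega A^{-1}a + (1-\omega)B^{-1}b\right)$$
$$\qquad +\; a^T \omega A^{-1} a + b^T (1-\omega) B^{-1} b.$$

The terms in $x$ can be gathered into a quadratic form with the substitution

$$C^{-1} = \omega A^{-1} + (1-\omega)B^{-1}$$
$$c = C\left(\omega A^{-1}a + (1-\omega)B^{-1}b\right),$$

exactly the same fusion formula as suggested by Uhlmann. The substitution of the defined variables
results in

$$t = x^T C^{-1}x - c^T C^{-1}x - x^T C^{-1}c + a^T \omega A^{-1}a + b^T(1-\omega)B^{-1}b$$
$$\;\;= (x-c)^T C^{-1}(x-c) - c^T C^{-1}c + a^T \omega A^{-1}a + b^T(1-\omega)B^{-1}b.$$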

The terms independent of $x$ in the above equation could be expanded, but these terms cancel in the
numerator and denominator of Equation 3 and need not be considered further. The integration of the
exponential terms containing $x$ in the denominator is widely known and results in

$$P_C(x) = \frac{e^{-(x-c)^T C^{-1}(x-c)/2}}{(2\pi)^{n/2}\,|C|^{1/2}},$$

a Gaussian density function.

Thus,  the  covariance  intersection  technique  selects  a  fused  probability  density  function  that  is  a  log- 
linear  combination  of  two  initial  probability  density  functions.  The  advantage  obtained  with  Gaussian 
functions  is  that  the  fusion  of  two  Gaussian  functions  results  in  a  Gaussian  function.  This  is  not  true  in 
general  for  the  fusion  of  density  functions  with  the  described  technique.  However,  there  are  other  families 
of  density  functions  that  possess  this  property,  such  as  functions  that  are  members  of  the  statistical 
exponential  families  [6]  of  which  Gaussian  functions  are  members.  Further  research  might  show  that 
other  functions  from  the  exponential  families  may  be  of  interest  for  applications  of  the  generalized 
covariance  intersection  technique. 
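
The claim that a renormalized log-linear combination of two Gaussian functions is again Gaussian, with the covariance intersection mean and covariance, can be checked numerically in one dimension. The sketch below is an illustration only: the function names, the integration range, and the parameter values are assumptions, and the normalization integral is approximated with a simple Riemann sum.

```python
import math

def gauss(x, mean, var):
    """1-D Gaussian density."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def fused_numeric(x, a, A, b, B, w, lo=-30.0, hi=30.0, steps=60000):
    """p_A^w * p_B^(1-w), renormalized with a Riemann sum over [lo, hi]."""
    dx = (hi - lo) / steps

    def unnorm(t):
        return gauss(t, a, A) ** w * gauss(t, b, B) ** (1.0 - w)

    norm = sum(unnorm(lo + k * dx) for k in range(steps + 1)) * dx
    return unnorm(x) / norm

def fused_closed_form(x, a, A, b, B, w):
    """Gaussian with the covariance intersection parameters (scalar case)."""
    C = 1.0 / (w / A + (1.0 - w) / B)            # C^{-1} = w/A + (1-w)/B
    c = C * (w * a / A + (1.0 - w) * b / B)      # c = C (w a/A + (1-w) b/B)
    return gauss(x, c, C)

# Assumed example values (means a, b; variances A, B; mixing parameter w).
a, A, b, B, w = 0.0, 1.0, 2.0, 4.0, 0.3
for x in (-1.0, 0.5, 3.0):
    print(x, fused_numeric(x, a, A, b, B, w), fused_closed_form(x, a, A, b, B, w))
```

The two columns agree to within the accuracy of the numerical integration, which is what the cancellation of the terms independent of $x$ predicts.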


3.1  THE  MINIMIZATION  CRITERION 

Given the fusion rule, the next item to consider is the selection of the mixing parameter $\omega$. One
possible criterion is to minimize the Shannon information of the fused Gaussian function because
Shannon information is a measure of the amount of information remaining to be extracted from a system
under observation. The Shannon information of a probability density function is

$$h = -\int P_C(x)\ln\left(P_C(x)\right)dx$$

and for a Gaussian function is

$$h = -\int \frac{e^{-(x-c)^T C^{-1}(x-c)/2}}{(2\pi)^{n/2}|C|^{1/2}} \ln\left(\frac{e^{-(x-c)^T C^{-1}(x-c)/2}}{(2\pi)^{n/2}|C|^{1/2}}\right)dx,$$

which expands to

$$h = -\int \frac{e^{-(x-c)^T C^{-1}(x-c)/2}}{(2\pi)^{n/2}|C|^{1/2}} \left(\ln\left(\frac{1}{(2\pi)^{n/2}|C|^{1/2}}\right) - (x-c)^T C^{-1}(x-c)/2\right)dx.$$
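
The integration involving the logarithmic term is the integral of the Gaussian probability density
function which, including the normalization constant, integrates to 1, giving

$$h = \ln\left((2\pi)^{n/2}|C|^{1/2}\right) + \int \frac{(x-c)^T C^{-1}(x-c)}{2}\,\frac{e^{-(x-c)^T C^{-1}(x-c)/2}}{(2\pi)^{n/2}|C|^{1/2}}\,dx. \qquad (4)$$

Gradshteyn and Ryzhik [7] provide the integral solution

$$\int_0^{\infty} x^{2k} e^{-px^2}\,dx = \frac{(2k-1)!!}{2(2p)^k}\sqrt{\frac{\pi}{p}},$$

which gives the solution to the integral of the second term in Equation 4. The integration limits in
Equation 4 allow for the substitution $x' = x - c$ without change to the form of the integral. Ignoring the
prime symbol in the substitution, and taking $k = 1$ and $p = q/2$, the solution to the integral along a
principal axis of the covariance matrix can be written as

$$\int_{-\infty}^{\infty} \frac{q y^2}{2}\exp\left(-q y^2/2\right)dy = \frac{1}{2}\sqrt{\frac{2\pi}{q}}. \qquad (5)$$

Decomposition of the covariance matrix into principal components allows the integral in Equation 4
to be written as

$$\int \frac{(x-c)^T C^{-1}(x-c)}{2}\exp\left(-(x-c)^T C^{-1}(x-c)/2\right)dx = \int \Big(\sum_i q_i y_i^2/2\Big)\exp\Big(-\sum_i q_i y_i^2/2\Big)dy,$$

where the $q_i$ are the eigenvalues of $C^{-1}$. The solution to the integral is thus

$$\int \Big(\sum_i q_i y_i^2/2\Big)\exp\Big(-\sum_i q_i y_i^2/2\Big)dy = \frac{n}{2}(2\pi)^{n/2}|C|^{1/2}$$

and the full-space integration leads to the Shannon information being

$$h = \frac{n}{2} + \frac{1}{2}\ln\left((2\pi)^n|C|\right),$$

where $n$ is the number of dimensions.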
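
The closed-form result can be checked against a direct numerical evaluation of the Shannon integral. The following sketch does this for a one-dimensional Gaussian, where the determinant reduces to the variance; the function names, integration range, and test variance are assumptions made for illustration, and natural logarithms (nats) are used throughout.

```python
import math

def gaussian_entropy(det_C, n):
    """Closed form h = n/2 + (1/2) ln((2 pi)^n |C|), in nats."""
    return 0.5 * n + 0.5 * math.log((2.0 * math.pi) ** n * det_C)

def numeric_entropy_1d(var, half_width=40.0, steps=80000):
    """Riemann-sum estimate of -integral p ln p dx for a zero-mean 1-D Gaussian."""
    dx = 2.0 * half_width / steps
    total = 0.0
    for k in range(steps + 1):
        x = -half_width + k * dx
        p = math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)
        if p > 0.0:                      # guard against log(0) in the far tails
            total -= p * math.log(p) * dx
    return total

# Assumed test case: variance 2.0 in one dimension.
print(gaussian_entropy(2.0, 1), numeric_entropy_1d(2.0))
```

Because the closed form depends on the covariance only through its determinant, any search over the mixing parameter that lowers the determinant lowers this quantity as well.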

The Shannon information of a Gaussian function is therefore related to the determinant of the
covariance. Minimization of the Shannon information of the fused Gaussian function is equivalent to the
minimization of the determinant of the covariance $C$, which is accomplished through the appropriate
choice of the mixing parameter $\omega$.

A point to note is that Shannon information is a convex function for the family of Gaussian
functions and therefore the maxima are at the ends of the chord for the covariance intersection technique
in combination with the Shannon information criterion [8]. No local maxima are possible on the chord
between a pair of Gaussian functions other than at the end points.


Weak associations between the Shannon information criterion and other suggested minimization
criteria may be found. The inequality [9]

$$|C| \le \left(\frac{1}{n}\operatorname{tr}(C)\right)^n$$

provides an indication as to why the trace operation would appear to work as a minimization
criterion for covariance intersection. Hadamard's inequality also provides a second minimization criterion
that would appear to work,

$$|C| \le \prod_i C_{ii}.$$

The minima for these functions may not be identical to the minimum of the determinant.
Justification for the use of these alternative minimization functions is not as strong as in the case of the
Shannon information minimization, and it could be argued that the apparent performance of these
minimization criteria is due to their inequalities in relationship to the determinant minimization.
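
A small numerical sketch can show that the trace and determinant criteria need not select the same mixing parameter. To keep the example short it assumes diagonal covariances, stored as lists of variances, so that the fused covariance is computed element by element; the function names, the grid search, and the numerical values are illustrative assumptions only.

```python
# Sketch only: diagonal covariances as lists of variances; names are hypothetical.

def ci_diag(A, B, w):
    """Fused variances for diagonal A and B: C_ii = 1 / (w/A_ii + (1-w)/B_ii)."""
    return [1.0 / (w / a + (1.0 - w) / b) for a, b in zip(A, B)]

def det(C):
    """Determinant of a diagonal covariance."""
    out = 1.0
    for v in C:
        out *= v
    return out

def trace(C):
    """Trace of a diagonal covariance."""
    return sum(C)

def argmin_w(A, B, criterion, steps=1000):
    """Grid search for the mixing parameter that minimizes the given criterion."""
    return min((k / steps for k in range(steps + 1)),
               key=lambda w: criterion(ci_diag(A, B, w)))

# Assumed example values.
A = [1.0, 4.0]
B = [9.0, 1.0]
w_det = argmin_w(A, B, det)
w_tr = argmin_w(A, B, trace)
print(w_det, w_tr)
```

For these values the determinant criterion selects a mixing parameter near 0.604 and the trace criterion one near 0.573, while the bound of the determinant by the scaled trace holds at every grid point.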


4.  OTHER  MINIMIZATION  CRITERIA 


Given the relationship between the Chernoff information equation and the covariance intersection
technique, Chernoff information may be suitable for use as another minimization criterion for some
applications. The applications most suitable for this purpose are the fusion of probability density functions
where the estimates converge toward a fixed density function as the number of estimates increases.
Chernoff's theorem as reformulated by Cover and Thomas is “the best achievable exponent in the
Bayesian probability of error is $D^*$, where

$$D^* = D(P_{\omega^*}\|P_1) = D(P_{\omega^*}\|P_2),$$

with

$$P_\omega = \frac{P_1^{\omega}(x)\,P_2^{1-\omega}(x)}{\sum_{a} P_1^{\omega}(a)\,P_2^{1-\omega}(a)},$$

and $\omega^*$ the value of $\omega$ such that

$$D(P_{\omega^*}\|P_1) = D(P_{\omega^*}\|P_2),$$”

where the relative entropy or Kullback-Leibler distance is

$$D(p\|q) = \int p(x)\log\left(\frac{p(x)}{q(x)}\right)dx.$$



Chernoff information can be used to select a probability density function that minimizes the
Bayesian error. Minimization of the Shannon information will not in general provide the same solution as
the minimum Bayesian error solution.

An overlooked assumption of covariance intersection is that the numbers of measurements that were
used to estimate the two density functions are unknown and assumed to be equal. If the numbers of
measurements that were used to derive the probability density functions are known, the Chernoff
information can be modified to account for the difference in error probabilities between the two functions.
Sanov's Theorem provides this connection:

“Let $X_1, X_2, \ldots, X_n$ be i.i.d. $\sim Q(x)$. Let $E \subseteq \mathcal{P}$ be a set of probability distributions. Then

$$Q^n(E) = Q^n(E \cap \mathcal{P}_n) \le (n+1)^{|\mathcal{X}|}\,2^{-nD(P^*\|Q)},$$

where

$$P^* = \arg\min_{P \in E} D(P\|Q),$$

is the distribution in $E$ that is closest to $Q$ in relative entropy.”

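By Sanov's theorem, the associated probabilities of error are

$$P_1^{n}(A^c) \doteq 2^{-nD(P_\omega\|P_1)}$$

and

$$P_2^{m}(A) \doteq 2^{-mD(P_\omega\|P_2)},$$

and the total probability of error is

$$P_e(A) \doteq \pi_1\,2^{-nD(P_\omega\|P_1)} + \pi_2\,2^{-mD(P_\omega\|P_2)},$$

where $A$ is the decision region for $P_1$, $\pi_1$ and $\pi_2$ are the prior probabilities, and $n$ and $m$
are the numbers of measurements used to estimate $P_1$ and $P_2$.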

The exponential rate is determined by the worst exponent, and the maximum value of the minimum
is obtained when the two terms are equal. For this application, we choose $\omega$ so that

$$nD(P_\omega\|P_1) = mD(P_\omega\|P_2),$$

where $n$ and $m$ are the numbers of measurements used to estimate $P_1$ and $P_2$, respectively.

Knowledge of the number of measurements that were used to estimate each probability density
function moves the selected fused probability density function toward the distribution with more
measurements.
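
The balance condition can be sketched for discrete distributions with a simple bisection on the mixing parameter. This is an illustration under assumptions, not a procedure given in the report: the function names, the example distributions, and the measurement counts are hypothetical, and natural logarithms are used since the balance point does not depend on the base.

```python
import math

def tilt(p1, p2, w):
    """Normalized log-linear combination P_w proportional to p1^w * p2^(1-w)."""
    u = [x ** w * y ** (1.0 - w) for x, y in zip(p1, p2)]
    z = sum(u)
    return [v / z for v in u]

def kl(p, q):
    """Relative entropy D(p || q) in nats."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0.0)

def balance_point(p1, p2, n=1, m=1, iters=60):
    """Bisection for w such that n * D(P_w || p1) = m * D(P_w || p2)."""
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        pw = tilt(p1, p2, mid)
        if n * kl(pw, p1) > m * kl(pw, p2):
            lo = mid      # P_w is still too far from p1: increase w
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Assumed example distributions on three outcomes.
p1 = [0.7, 0.2, 0.1]
p2 = [0.1, 0.2, 0.7]
w_equal = balance_point(p1, p2)               # equal counts: the Chernoff point
w_more = balance_point(p1, p2, n=10, m=1)     # p1 backed by ten times the data
print(w_equal, w_more)
```

With equal counts the symmetric pair balances at the midpoint. With ten times as many measurements behind the first distribution, the selected parameter moves above one half, that is, toward the distribution with more measurements.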


5.  THREE-DIMENSIONAL  EXAMPLE 


The generalization of covariance intersection works not only for Gaussian functions but also for the
simple case of a three-dimensional probability space. The two-dimensional probability space is simpler
but less interesting; the Shannon information minimization criterion selects the member of the pair with
the least information instead of an intermediate point between the two. Figure 1 shows contours of equal
information on the probability simplex in probability space. Figure 2 shows the information contours, the
log-linear chord between two points on the probability surface, and the location of the information
minimum for the pair of points.

Figure 3 shows a plane in a log-probability space. The log-probability space is of interest because
the intermediate probabilities between two probability vectors lie along a line. The conversion from
probability space to log-probability space is

$$\ell_i = -\ln(p_i)$$

and the inverse operation is

$$p_i = \frac{\exp(-\ell_i)}{\sum_j \exp(-\ell_j)},$$

where normalization is required for the reverse mapping to probability space. The mapping from
log space to probability space is an infinite-to-one mapping with the identity

$$p(\ell) = p(\ell + a\mathbf{1}),$$

where $a$ is any real number and the vector $\mathbf{1} = (1\;1\;1)$ for the three-dimensional space. Figure 3
shows the Shannon information for a plane in the three-dimensional log-probability space with the normal
vector $\mathbf{1}$. Examination of the Shannon information, shown as a projection out of the plane, reveals that
the chord between two points in log-probability space can have both minima and maxima. It is possible to
have up to two local maxima and up to three local minima (two at the ends) along the chord between two
points for the three-dimensional space. The minima lie in three troughs located below the three gray
arrows in Figure 3. The simple rule of selecting the Shannon information minimum from covariance
intersection might be replaced with more complex rules in certain applications. The existence of a local
maximum on the chord indicates that the two points represent probability density functions that are in
conflict. In this case, one possible rule set might be to select the information maximum between the two
points as the fused log probability point when the two points are local minima. For pairs with two maxima
and three minima, the rule might be to select the central minimum if it is less than either of the end points,
otherwise select the point with the global maximum on the line segment. Other rule sets are possible, and
further research might show which sets are reasonable for different applications.
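
The log-probability mapping and the behavior of the Shannon information along a chord can be explored with a short sketch. The two probability vectors below are assumed values chosen so that the points are in conflict, and all function names are hypothetical; the code only tabulates the information along the chord and does not implement any of the candidate rule sets.

```python
import math

def to_log(p):
    """Probability vector to log-probability coordinates, l_i = -ln(p_i)."""
    return [-math.log(pi) for pi in p]

def to_prob(l):
    """Inverse mapping with normalization, p_i = exp(-l_i) / sum_j exp(-l_j)."""
    u = [math.exp(-li) for li in l]
    s = sum(u)
    return [ui / s for ui in u]

def shannon(p):
    """Shannon information -sum p_i ln p_i, in nats."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0.0)

def chord_point(p, r, w):
    """Point at weight w on the straight line between p and r in log space."""
    lp, lr = to_log(p), to_log(r)
    return to_prob([w * x + (1.0 - w) * y for x, y in zip(lp, lr)])

# Assumed example: two vectors that favor different outcomes.
p = [0.8, 0.1, 0.1]
r = [0.1, 0.8, 0.1]
for k in range(11):
    w = k / 10
    print(w, shannon(chord_point(p, r, w)))

# Adding a constant to every log coordinate maps back to the same probabilities.
shifted = to_prob([x + 5.0 for x in to_log(p)])
print(shifted)
```

For this pair the information at the middle of the chord exceeds the information at both end points, which is the signature of conflicting inputs described above; pairs that agree more closely can instead show an interior minimum.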


Figure  1.  Shannon  information  contour. 


Figure  2.  Log-linear  chord  between  two  points. 


Figure  3.  Information  surface  for  a  plane  in  log-probability  space. 


6.  SUMMARY 


It  has  been  shown  that  the  covariance  intersection  technique  is  related  to  a  more  general  fusion 
technique  based  upon  Chernoff  information.  The  generalization  shows  that  the  covariance  intersection 
technique  finds  a  density  function  that  is  the  log-linear  combination  of  two  initial  density  functions. 
Optimization  criteria  specify  an  optimal  point  on  the  chord  between  two  probability  density  functions  in  a 
log-probability  space  and  provide  an  “improved”  density  function.  Shannon  information  is  a  natural 
measure  for  the  selection  criterion  when  knowledge  of  the  number  of  measurements  is  unavailable  and 
the  two  density  functions  were  possibly  generated  from  a  substantial  subset  of  common  measurements. 

In  light  of  this  new  technique,  many  of  the  more  traditional  fusion  techniques  can  be  seen  to  be 
associated  with  estimating  probability  density  functions  appropriate  for  a  set  of  measurements.  With  the 
assumption  that  the  measurements  are  independent  and  identically  distributed,  additional  measurements 
significantly  restrict  the  set  of  probability  density  functions  that  describe  the  data.  This  generalized  fusion 
of  probability  density  functions  appears  to  be  a  new  technique  that  should  be  examined  further  to 
determine  its  range  of  applicability  and  its  relationship  to  other  known  probability  fusion  techniques. 